Fallback to CPU when video isn't supported by NVDEC #977
Conversation
```cpp
  return decoder->frameReadyInDisplayOrder(dispInfo);
}

static UniqueCUvideodecoder createDecoder(CUVIDEOFORMAT* videoFormat) {
```
All checks below have been moved to the new `bool nativeNVDECSupport(const SharedAVCodecContext& codecContext);`
```cpp
    return std::nullopt;
  }
}
```
These are the same checks as before, but with a crucial difference: the source of truth for the stream properties is now FFmpeg's AVCodecContext. Previously, it was NVCUVID's CUVIDEOFORMAT.
I think that's fine: I can't imagine NVCUVID being more accurate than FFmpeg.
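As a rough illustration only, such a check could derive everything from the AVCodecContext and then ask NVDEC directly. In the sketch below, the helper avCodecIdToCuvidCodec and the 4:2:0 assumption are made up; this is not the PR's actual code.

```cpp
#include <optional>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/pixdesc.h>
}
#include <nvcuvid.h>

// Hypothetical helper: map an FFmpeg codec ID to the NVCUVID codec enum,
// returning std::nullopt for codecs NVDEC cannot decode at all.
std::optional<cudaVideoCodec> avCodecIdToCuvidCodec(AVCodecID id);

bool nativeNVDECSupport(const SharedAVCodecContext& codecContext) {
  std::optional<cudaVideoCodec> codec =
      avCodecIdToCuvidCodec(codecContext->codec_id);
  if (!codec.has_value()) {
    return false;
  }

  // Bit depth comes from FFmpeg's pixel format descriptor, not from
  // NVCUVID's CUVIDEOFORMAT.
  const AVPixFmtDescriptor* desc = av_pix_fmt_desc_get(codecContext->pix_fmt);
  if (desc == nullptr) {
    return false;
  }

  // Ask NVDEC whether this codec / chroma / bit-depth combination is
  // supported on the current GPU (4:2:0 assumed for brevity).
  CUVIDDECODECAPS caps = {};
  caps.eCodecType = *codec;
  caps.eChromaFormat = cudaVideoChromaFormat_420;
  caps.nBitDepthMinus8 = desc->comp[0].depth - 8;
  return cuvidGetDecoderCaps(&caps) == CUDA_SUCCESS && caps.bIsSupported;
}
```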
```cpp
  }
  return;
}
```
Above is essentially the same as the FFmpeg interface logic, from torchcodec/src/torchcodec/_core/CudaDeviceInterface.cpp, lines 250 to 290 at 3827dfe:
```cpp
if (avFrame->format != AV_PIX_FMT_CUDA) {
  // The frame's format is AV_PIX_FMT_CUDA if and only if its content is on
  // the GPU. In this branch, the frame is on the CPU. There are two possible
  // reasons:
  //
  // 1. During maybeConvertAVFrameToNV12OrRGB24(), we had a non-NV12 format
  //    frame and we're on FFmpeg 4.4 or earlier. In such cases, we had to
  //    use CPU filters and we just converted the frame to RGB24.
  // 2. This is what NVDEC gave us if it wasn't able to decode a frame, for
  //    whatever reason. Typically that happens if the video's encoder isn't
  //    supported by NVDEC.
  //
  // In both cases, we have a frame on the CPU. We send the frame back to the
  // CUDA device when we're done.
  enum AVPixelFormat frameFormat =
      static_cast<enum AVPixelFormat>(avFrame->format);
  FrameOutput cpuFrameOutput;
  if (frameFormat == AV_PIX_FMT_RGB24) {
    // Reason 1 above. The frame is already in RGB24, we just need to convert
    // it to a tensor.
    cpuFrameOutput.data = rgbAVFrameToTensor(avFrame);
  } else {
    // Reason 2 above. We need to do a full conversion which requires an
    // actual CPU device.
    cpuInterface_->convertAVFrameToFrameOutput(avFrame, cpuFrameOutput);
  }
  // Finally, we need to send the frame back to the GPU. Note that the
  // pre-allocated tensor is on the GPU, so we can't send that to the CPU
  // device interface. We copy it over here.
  if (preAllocatedOutputTensor.has_value()) {
    preAllocatedOutputTensor.value().copy_(cpuFrameOutput.data);
    frameOutput.data = preAllocatedOutputTensor.value();
  } else {
    frameOutput.data = cpuFrameOutput.data.to(device_);
  }
  return;
}
```
There are still minor differences that we should eventually align, as part of #943.
```python
# https://github.com/meta-pytorch/torchcodec/pull/977
# We need to define a better way to assert which backend a decoder
# is using.
return
```
It's OK to deactivate this test for now; we know it's working fine.
I think we should have a private way to query the VideoDecoder object so that it can tell us which interface / backend it's currently using. I will follow up on this.
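For that follow-up, something like the hypothetical sketch below could work on the C++ side; the enum, member, and accessor are all made up here, nothing like this exists yet:

```cpp
// Hypothetical: record which code path ended up decoding the stream,
// and expose it through a private binding so tests can assert on it.
enum class ActiveBackend {
  FFmpegCuda,   // FFmpeg's built-in CUDA hwaccel path
  BetaCuda,     // the BETA NVDEC interface
  CpuFallback,  // BETA interface fell back to CPU decoding
  Cpu,          // plain CPU decoding
};

class SingleStreamDecoder {
 public:
  // Tests would reach this through a private Python binding.
  ActiveBackend activeBackend() const {
    return activeBackend_;
  }

 private:
  ActiveBackend activeBackend_ = ActiveBackend::Cpu;
};
```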
```python
dec.get_frame_at(0)
beta = VideoDecoder(H265_VIDEO.path, device="cuda").get_frame_at(0)

torch.testing.assert_close(ffmpeg.data, beta.data, rtol=0, atol=0)
```
Above: we used to assert an error; now we just assert that the video can be decoded properly. Strictly speaking, we're not asserting that there is a fallback. (That will probably come in follow-ups.)
There's another important thing to check that we're not actually enforcing in the tests: we need to make sure that we don't have false positives, i.e. that we're not falling back to the CPU when we don't need to. To check that, I added a TORCH_CHECK(false) in the CPU fallback path. All the tests are passing, except for these two:

FAILED test/test_decoders.py::TestVideoDecoder::test_get_key_frame_indices[cuda:beta] - RuntimeError: Falling back to CPU decoder.
FAILED test/test_decoders.py::TestVideoDecoder::test_beta_cuda_interface_cpu_fallback - RuntimeError: Falling back to CPU decoder.

This is the expected result: these two tests use H265_VIDEO, which is the only video that needs a fallback.
I will try to follow up with an official and robust way to assert the backend / fallback of the VideoDecoder object (see other comment below).
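Concretely, the temporary check was something along these lines; the condition and surrounding code are assumptions for illustration, only the TORCH_CHECK pattern and the message are taken from the failures above:

```cpp
// Hypothetical placement, at the top of the CPU fallback path. With this
// check in place, any test that silently falls back to the CPU fails
// loudly, which is how the two expected failures above were identified.
if (usingCpuFallback_) {
  TORCH_CHECK(false, "Falling back to CPU decoder.");
}
```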
```cpp
    // We'll always use the CPU fallback from now on, so we can return early.
    return;
  }
```
Above: we create the fallback CPU interface only if we need it.
I'm a bit sad that the BETA Interface needs to store a CPU interface. I have TODOs in other places basically saying "An interface shouldn't rely on another interface".
But I'm not sure we have a better way to implement this fallback mechanism, so I think our principle should be something less strict, along these lines: "It's OK for interface A to use interface B, but interface B shouldn't need to worry about being called by interface A". I think we're respecting this principle so far.
The main design alternative would be to implement the fallback within SingleStreamDecoder rather than in the interface itself, but I'm not sure we want this: specifically for this fallback, we'd have to create a third, hybrid interface (the fallback one) that runs all the decoding on the CPU but runs the color conversion (convertAVFrameToFrameOutput) on the GPU. Sounds like a hassle.
"It's OK for interface A to use interface B, but interface B shouldn't need to worry about being called by interface A". I think we're respecting this principle so far.
Agreed and agreed.
In my previous refactoring of the FFmpeg CUDA device interface, I believe I had always created the CPU device for fallback. But in that device, I also think we only knew if we would need the fallback when we discovered that FFmpeg had decoded on the frame on the CPU and then we needed to do CPU color conversion. If we could discover earlier in the FFmpeg CUDA device interface, then doing this pattern would be good. But I doubt we can
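For reference, a minimal sketch of the lazy-creation pattern under discussion, assuming a createDeviceInterface() factory and the member names below (all illustrative, not the PR's actual code):

```cpp
// Hypothetical sketch of the BETA interface's initialization. The CPU
// interface (interface B) is only created when NVDEC can't handle the
// stream, and it never needs to know it's being driven by the CUDA
// interface (interface A).
void BetaCudaDeviceInterface::initialize(
    const SharedAVCodecContext& codecContext) {
  if (!nativeNVDECSupport(codecContext)) {
    cpuInterface_ = createDeviceInterface(torch::Device(torch::kCPU));
    // We'll always use the CPU fallback from now on, so we can return early.
    return;
  }
  // Otherwise, set up the native NVDEC decoder as usual.
  // ...
}
```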
```cpp
  } else {
    frameOutput.data = cpuFrameOutput.data.to(device_);
  }
  return;
```
We do this dance in at least one other place in the FFmpeg CUDA device. Maybe part of #943 could also be pulling this into a utils function.
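For instance, a small utility like the sketch below would cover both call sites; the name and signature are made up for illustration, and the body just mirrors the branch above:

```cpp
// Hypothetical utility: place a CPU-decoded frame tensor on the GPU,
// reusing the pre-allocated output tensor when one was provided (it
// already lives on the GPU, so we copy into it instead of allocating).
void moveFrameToDevice(
    const torch::Tensor& cpuData,
    const torch::Device& device,
    std::optional<torch::Tensor>& preAllocatedOutputTensor,
    FrameOutput& frameOutput) {
  if (preAllocatedOutputTensor.has_value()) {
    preAllocatedOutputTensor.value().copy_(cpuData);
    frameOutput.data = preAllocatedOutputTensor.value();
  } else {
    frameOutput.data = cpuData.to(device);
  }
}
```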
Co-authored-by: Molly Xu <[email protected]>
Currently in our BETA CUDA Interface, we error loudly if we find that NVDEC doesn't support the input video.
In this PR, we do not error loudly anymore: we fall back to the CPU. This aligns the behavior of the BETA Interface with that of the FFmpeg interface, since FFmpeg was doing the fallback internally already (see #943).
There are some key design choices which are better explained in context, so I'll leave some comments below.